Utility modules for lists and maps #18

sungshik · 2024-09-10T08:19:06Z

This is a small refactoring PR in anticipation of the next feature PR (too many generic utility functions started to get scattered across the code base and were awkward to reuse from other modules, which we'll need next). Specifically, this PR:

Adds new modules MapUtil and ListUtil
Adds an explicit type somewhere to help the type checker (unrelated to the new modules, but ok to fix in this PR too)
Removes some obsolete code that's no longer used (unrelated to the new modules, but ok to fix in this PR too)

Moving code to new places aside (and rewriting a few comments), none of the moved code has been changed.

sungshik · 2024-09-10T08:21:12Z

rascal-textmate-core/src/main/rascal/lang/textmate/Grammar.rsc

-@synopsis{
-    Adds a TextMate rule to both the repository and the patterns of TextMate
-    grammar `g`
-}
-
-TmGrammar addRule(TmGrammar g, TmRule r)
-    = g [repository = g.repository + (r.name: r)]
-        [patterns = appendIfAbsent(g.patterns, include("#<r.name>"))];
-
-// TODO: This function could be moved to a separate, generic module
-private list[&T] appendIfAbsent(list[&T] vs, &T v)
-    = v in vs ? vs : vs + v;
-


This code has become unused and is just removed

DavyLandman · 2024-09-10T11:16:45Z

rascal-textmate-core/src/main/rascal/util/ListUtil.rsc

+}
+
+bool isStrictPrefix(list[&T] l1, list[&T] l2)
+    = size(l1) < size(l2) && !any(i <- [0..size(l1)], l1[i] != l2[i]);


ps, in the challenge:

bool isStrictPrefix(list[&T] l1, list[&T] l2) = size(l1) < size(l2) && l1 == l2[..size(l2);

is even shorter ;)

Or even this ;)

bool isStrictPrefix(list[&T] l1:[_,*_], [*l1, _, *_]) = true; default bool isStrictPrefix(_, _) = false;

That's brilliant, I concede 😉

I got a bit interested about performance of the various implementations (not really relevant for the current use case; just out of curiosity) and took a few anecdotal measurements. For posterity, here it goes...

Implementations

bool isStrictPrefix1([], []) = false; bool isStrictPrefix1([], [_, *_]) = true; bool isStrictPrefix1([_, *_], []) = false; bool isStrictPrefix1([head1, *tail1], [head2, *tail2]) = head1 == head2 && isStrictPrefix1(tail1, tail2); // -------------------------------------------------------------------------- // bool isStrictPrefix2(list[&T] l1, list[&T] l2) { if (size(l1) >= size(l2)) { return false; } if (i <- [0..size(l1)], l1[i] != l2[i]) { return false; } return true; } // -------------------------------------------------------------------------- // bool isStrictPrefix3(list[&T] l1, list[&T] l2) = size(l1) < size(l2) && !any(i <- [0..size(l1)], l1[i] != l2[i]); // -------------------------------------------------------------------------- // bool isStrictPrefix4(list[&T] l1, list[&T] l2) = size(l1) < size(l2) && l1 == l2[..size(l1)]; // -------------------------------------------------------------------------- // bool isStrictPrefix5(list[&T] l1:[_,*_], [*l1, _, *_]) = true; default bool isStrictPrefix5(_, _) = false;

Measurements

import IO; import util::Benchmark; void main(int i = 0) { cases = ( "v1": void() { for (_ <- [0..1000]) { isStrictPrefix1([0..i], [0..1000]); } }, "v2": void() { for (_ <- [0..1000]) { isStrictPrefix2([0..i], [0..1000]); } }, "v3": void() { for (_ <- [0..1000]) { isStrictPrefix3([0..i], [0..1000]); } }, "v4": void() { for (_ <- [0..1000]) { isStrictPrefix4([0..i], [0..1000]); } }, "v5": void() { for (_ <- [0..1000]) { isStrictPrefix5([0..i], [0..1000]); } } ); println(benchmark(cases)); }

Quiz

Question: Which implementation is slowest?

Answer...

isStrictPrefix1 is always slowest. For small prefixes, all other implementations are comparably fast. For larger prefixes, isStrictPrefix2 gets increasingly slower than ...3/4/5 (but never as slow as ...1).

Question: Which implementation is fastest?

Answer...

isStrictPrefix3/4/5 are comparably fast for small prefixes and non-prefixes. For larger prefixes, ...4/5 are clearly fastest (but there is no clear winner between the two).

Question: Which implementation crashes, if any?

Answer...

isStrictPrefix1 crashes with a java.lang.StackOverflowError for a list of size 1000 and a prefix of size 999 (not necessarily a minimal example).

Results

First series

With isStrictPrefix1:

rascal>main(i = 9) ("v1":404,"v2":304,"v3":322,"v4":312,"v5":306) ok rascal>main(i = 99) ("v1":1461,"v2":663,"v3":413,"v4":288,"v5":320) ok rascal>main(i = 999) java.lang.StackOverflowError

Second series

Without isStrictPrefix1:

rascal>main(i = 9) ("v2":319,"v3":387,"v4":344,"v5":374) ok rascal>main(i = 99) ("v2":594,"v3":450,"v4":332,"v5":421) ok rascal>main(i = 999) ("v2":18232,"v3":1713,"v4":577,"v5":520) ok rascal>main(i = 1000) ("v2":524,"v3":526,"v4":517,"v5":513) ok rascal>main(i = 9999) ("v2":3021,"v3":3003,"v4":3098,"v5":2975) ok

Micro-benchmarks are hard ;) these are doing to much in the loop that is not part of the focus.

On my machine, I get these numbers:

rascal>main_old(); Reloading module Bench ("v2":213,"v3":171,"v4":176,"v5":218)

and after rewriting the benchmark:

void main(int i = 0) { full = [0..1000]; needle = [0..i]; rounds = [0..1000]; cases = ( "v2": void() { for (_ <- rounds) { isStrictPrefix2(needle, full); } }, "v3": void() { for (_ <- rounds) { isStrictPrefix3(needle, full); } }, "v4": void() { for (_ <- rounds) { isStrictPrefix4(needle, full); } }, "v5": void() { for (_ <- rounds) { isStrictPrefix5(needle, full); } } ); println(benchmark(cases)); }

it prints:

Reloading module Bench ("v2":71,"v3":47,"v4":37,"v5":24)

suddenly v5 is the fastest, and the numbers are all quite a bit lower, let's increase the rounds to 100000.

rascal>main() ("v2":908,"v3":852,"v4":839,"v5":187)

so yeah, v5 might be a quite a bit faster than the rest of them ;)

Thank you, that's interesting! 🙂

Although I would have kept V4 for readability ;)

sungshik added 4 commits September 10, 2024 10:07

Create new module MapUtil and move existing code there

453ebcb

Create new module ListUtil and move existing code there

d8e8f13

Add explicit type to help type checker

9c77dff

Remove unused code

6b204ad

sungshik commented Sep 10, 2024

View reviewed changes

sungshik marked this pull request as ready for review September 10, 2024 08:22

DavyLandman reviewed Sep 10, 2024

View reviewed changes

sungshik added 3 commits September 10, 2024 15:18

Improve isStrictPrefix (shorter and faster!)

9aa5552

Fix typo

5514a53

Remove unused imports

b0a2942

sungshik merged commit 899fb3d into main Sep 11, 2024
2 checks passed

sungshik deleted the utility-modules-for-lists-and-maps branch September 11, 2024 12:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Utility modules for lists and maps #18

Utility modules for lists and maps #18

sungshik commented Sep 10, 2024 •

edited

Loading

sungshik Sep 10, 2024

DavyLandman Sep 10, 2024

sungshik Sep 10, 2024

sungshik Sep 10, 2024 •

edited

Loading

DavyLandman Sep 10, 2024 •

edited

Loading

sungshik Sep 11, 2024

DavyLandman Sep 11, 2024

Utility modules for lists and maps #18

Utility modules for lists and maps #18

Conversation

sungshik commented Sep 10, 2024 • edited Loading

sungshik Sep 10, 2024

Choose a reason for hiding this comment

DavyLandman Sep 10, 2024

Choose a reason for hiding this comment

sungshik Sep 10, 2024

Choose a reason for hiding this comment

sungshik Sep 10, 2024 • edited Loading

Choose a reason for hiding this comment

Implementations

Measurements

Quiz

Question: Which implementation is slowest?

Question: Which implementation is fastest?

Question: Which implementation crashes, if any?

Results

DavyLandman Sep 10, 2024 • edited Loading

Choose a reason for hiding this comment

sungshik Sep 11, 2024

Choose a reason for hiding this comment

DavyLandman Sep 11, 2024

Choose a reason for hiding this comment

sungshik commented Sep 10, 2024 •

edited

Loading

sungshik Sep 10, 2024 •

edited

Loading

DavyLandman Sep 10, 2024 •

edited

Loading